Environmental robust features for speech detection
Authors
Abstract
In this paper, two novel features, Line Spectrum Center Range and Line Spectrum Flux, both derived from Line Spectrum Frequencies, are proposed to detect the presence of speech in various acoustic environments. Evaluation results using Fisher Discriminant Analysis and Scatter Matrices indicate that the new features outperform state-of-the-art features. An environmentally robust hybrid feature set comprising the proposed features, Normalized Energy Dynamic Range, and Mel-Frequency Cepstrum Coefficients is further introduced. When evaluated on a Gaussian Mixture Model based classification engine, the hybrid feature set outperformed Mel-Frequency Cepstrum Coefficients in terms of relative frame error rate.
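The abstract does not spell out how the features were scored, so the following is only a minimal sketch of two quantities it names: a two-class Fisher discriminant ratio for a single scalar feature, and a relative reduction in frame error rate. The function names, the use of population variance, and the toy values are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import fmean, pvariance


def fisher_ratio(speech_values, nonspeech_values):
    """Two-class Fisher discriminant ratio for one scalar feature.

    Computed as (difference of class means)^2 / (sum of class variances).
    Larger values mean the feature separates the two classes better.
    Assumption: population variance is used for each class.
    """
    mean_gap = fmean(speech_values) - fmean(nonspeech_values)
    within = pvariance(speech_values) + pvariance(nonspeech_values)
    if within == 0:
        raise ValueError("both classes have zero variance; ratio is undefined")
    return mean_gap ** 2 / within


def relative_error_reduction(baseline_fer, new_fer):
    """Relative frame error rate reduction of a new system over a baseline."""
    if baseline_fer <= 0:
        raise ValueError("baseline frame error rate must be positive")
    return (baseline_fer - new_fer) / baseline_fer


if __name__ == "__main__":
    # Toy per-frame feature values (hypothetical, not from the paper).
    speech = [0.9, 1.1, 1.0, 1.2]
    nonspeech = [0.2, 0.4, 0.3, 0.1]
    print(fisher_ratio(speech, nonspeech))
    print(relative_error_reduction(0.20, 0.15))
```

A feature with a higher ratio on labeled speech and non-speech frames would be preferred under this criterion; the multi-dimensional scatter-matrix form generalizes the same idea to feature vectors.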
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have proven effective in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Speech Modulation Features for Robust Nonnative Speech Accent Detection
In this paper, we propose to use speech modulation features for robust nonnative accent detection. The modulation spectrum carries long-term temporal information about speech and may discriminate between the accents of native and nonnative speakers. For each speech segment to be tested, we extract a 10-dimensional feature vector from the modulation spectrum and use it for model training and testing. The proposed modula...
Speech activity detection on youtube using deep neural networks
Speech activity detection (SAD) is an important first step in speech processing. Commonly used methods (e.g., frame-level classification using Gaussian mixture models (GMMs)) work well under stationary noise conditions, but do not generalize well to domains such as YouTube, where videos may exhibit a diverse range of environmental conditions. One solution is to augment the conventional cepstral...
Missing features detection and handling for robust speaker verification
This paper addresses the problem of robust text-independent speaker verification in the presence of missing (masked by noise) features. It presents and assesses several missing feature handling approaches. In these approaches, the speech enhancement and missing feature detection are based on the minimum mean-square error (MMSE) spectral amplitude estimator of Ephraim and Malah [1].
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...